In this paper we study the identification of boosted hadronically decaying top quarks using jet
substructure in the center-of-mass frame of the jet. We demonstrate that the method can greatly
reduce the QCD jet background while maintaining a high identification efficiency for boosted top
quarks, even under very high pileup conditions. Applications to searches for heavy resonances that
decay to a $t\bar{t}$ final state are also discussed.

I. INTRODUCTION

Searching for and discovering new physics (NP) beyond the standard model (SM) has been one of the main physics motivations for building the Large Hadron Collider (LHC). One prominent path to finding NP is through model-independent searches for possible new particles beyond the SM. Many extensions of the SM predict new heavy resonances with masses at the TeV scale. Some of these heavy resonances, such as a new heavy gauge boson $Z'$ or Kaluza-Klein (KK) gluons from the bulk Randall-Sundrum model, can decay predominantly to a top-antitop quark pair ($t\bar{t}$) final state. Because of the energy scale of these processes, the top quarks from the heavy resonance decay receive a significant Lorentz boost. Their hadronic decay products are usually so collimated that they can only be reconstructed as single jets in the experiments.

In recent years, many theoretical studies have been performed to investigate the signature of boosted hadronically decaying top quarks, where a hadronically decaying top is defined as a top quark whose W boson daughter decays hadronically. Several experimental searches for new heavy resonances decaying to $t\bar{t}$ final states have also been carried out by the ATLAS and CMS experiments at the LHC. The measurements exclude the production of such a new heavy resonance with a mass up to 1-2 TeV, depending on the NP model used for the theoretical interpretation of the results. In all those studies and measurements, the complete final state of the top quark decay is reconstructed as a single jet, hereafter referred to as a t jet. The invariant mass of the reconstructed jet ($m_{\rm jet}$) is used to distinguish the t jets from QCD jets, where QCD jets are defined as jets initiated by a non-top quark or a gluon. Since the jet mass alone may not provide sufficient discriminating power to effectively separate t jets from the overwhelming QCD background in many analyses, techniques based on jet substructure information, such as jet shape observables, filtering [20], pruning [21] and trimming, are typically implemented as additional experimental handles to identify boosted hadronically decaying top quarks.

In our previous paper [23], we introduced a new approach to study jet substructure in the center-of-mass frame of the jet. We demonstrated that it can be used to discriminate boosted heavy particles from QCD jets and that the method is complementary to other jet substructure algorithms. A similar idea has also been explored to search for a hadronically decaying Higgs boson. In this paper, we extend the studies presented in Ref. [23] to focus on identifying boosted hadronically decaying top quarks in the center-of-mass frame of the jet. We demonstrate that the method can greatly reduce the QCD jet background while maintaining a high identification efficiency for boosted top quarks, even in an environment with a very large number of multiple interactions per event (pileup). Using an example application, we show good prospects for searches for heavy particles in the $t\bar{t}$ decay channel at the LHC.

This paper is organized as follows: In Section II we describe the event sample used in the study. Section III discusses the method to identify t jets using jet substructure in the jet center-of-mass frame and its performance. An example of the application of our method is given in Section IV. We conclude in Section V.



II. EVENT SAMPLE 

We use boosted t jets from the SM process of $t\bar{t}$ production as a benchmark to study the identification of boosted hadronically decaying top quarks with our proposed jet substructure method. For simplicity we only consider the background from SM dijet production, since its cross section is several orders of magnitude larger than those of the other SM backgrounds. In addition, we also generate events to simulate a heavy particle X that decays to a $t\bar{t}$ final state.

All the events used in this analysis are produced using the Pythia 6.421 event generator [20] for pp collisions at 14 TeV center-of-mass energy. In order to simulate the finite resolution of the calorimeters at the LHC, we divide the ($\eta$, $\phi$) plane into $0.1 \times 0.1$ cells. We sum the energy of the particles entering each cell in each event, except for neutrinos and muons, and replace it with a massless pseudoparticle of the same energy, also referred to as an energy cluster, pointing to the center of the cell. These pseudoparticles are fed into the FastJet 3.0.1 [27] package for jet reconstruction. The jets are reconstructed using the anti-$k_T$ algorithm with a distance parameter of $\Delta R = 0.6$. The anti-$k_T$ jet algorithm is the default one used by the ATLAS and CMS experiments. In order to evaluate the performance of top jet identification under the currently expected experimental conditions at the LHC, we generate MC events with different average numbers of multiple interactions per event [26] and then repeat our studies for each scenario. We compare the results to those obtained under the ideal experimental condition with no pileup.
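
The calorimeter granularity step described above can be illustrated with a minimal sketch. It is not the code used in the study: the input layout `(energy, eta, phi, pdg_id)` and the function name `to_pseudoparticles` are assumptions made for this example, and the cell at the $\phi$ wrap-around is slightly irregular because $2\pi$ is not a multiple of 0.1.

```python
import math
from collections import defaultdict

CELL = 0.1  # cell size in eta and phi, as quoted in the text


def to_pseudoparticles(particles, cell=CELL):
    """Bin particles into eta-phi cells and return one massless
    (E, px, py, pz) pseudoparticle per hit cell.

    particles: iterable of (energy, eta, phi, pdg_id) tuples
    (hypothetical input layout chosen for this sketch).
    """
    invisible = {12, 14, 16, 13}  # neutrinos and muons (absolute PDG codes)
    cells = defaultdict(float)
    for energy, eta, phi, pdg_id in particles:
        if abs(pdg_id) in invisible:
            continue
        phi = math.atan2(math.sin(phi), math.cos(phi))  # wrap into (-pi, pi]
        key = (math.floor(eta / cell), math.floor(phi / cell))
        cells[key] += energy

    pseudoparticles = []
    for (i_eta, i_phi), energy in cells.items():
        eta_c = (i_eta + 0.5) * cell  # cell center
        phi_c = (i_phi + 0.5) * cell
        pt = energy / math.cosh(eta_c)  # massless: E = pT * cosh(eta)
        pseudoparticles.append((energy,
                                pt * math.cos(phi_c),
                                pt * math.sin(phi_c),
                                pt * math.sinh(eta_c)))
    return pseudoparticles
```

The resulting four-vectors would then be passed to a jet clustering package; that step is not shown here.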



III. JET SUBSTRUCTURE IN THE REST FRAME

In this section we describe the method to study the jet substructure of t jets in the center-of-mass frame of the jet in order to distinguish them from QCD jets. We select jets with $p_T > 600$ GeV and $|\eta| < 1.9$ as t jet candidates, where $p_T$ and $\eta$ are the transverse momentum and pseudorapidity of the jet. We further require that the t jet candidates have 50 GeV < $m_{\rm jet}$ < 350 GeV. If there is more than one candidate in an event, all of them are kept for further analysis.



A. Center-of-mass frame of a jet 

We define the center-of-mass frame (rest frame) of a jet as the frame where the four-momentum of the jet is equal to $p^{\mu}_{\rm jet} = (m_{\rm jet}, 0, 0, 0)$. A jet consists of its constituent particles. The distribution of the constituent particles of a boosted t jet in its center-of-mass frame has a three-body decay topology, as in the top quark rest frame. On the other hand, the constituent particle distribution of a QCD jet in its rest frame does not correspond to any physical state and is more likely to be isotropic.
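
As an illustration of this definition, the following sketch boosts the constituents of a jet into the frame where their summed four-momentum is $(m_{\rm jet}, 0, 0, 0)$. The function name and the `(E, px, py, pz)` tuple layout are assumptions made for the example.

```python
import math


def boost_to_rest_frame(constituents):
    """Boost a list of (E, px, py, pz) four-vectors into the frame in which
    their sum is (m_jet, 0, 0, 0)."""
    E = sum(c[0] for c in constituents)
    px = sum(c[1] for c in constituents)
    py = sum(c[2] for c in constituents)
    pz = sum(c[3] for c in constituents)
    m2 = E * E - px * px - py * py - pz * pz
    if m2 <= 0.0:
        raise ValueError("jet has non-positive invariant mass squared")
    m = math.sqrt(m2)

    bx, by, bz = px / E, py / E, pz / E  # velocity of the jet in the lab frame
    b2 = bx * bx + by * by + bz * bz
    gamma = E / m

    boosted = []
    for e, x, y, z in constituents:
        bp = bx * x + by * y + bz * z
        # Standard Lorentz boost: p' = p + [(gamma - 1)(b.p)/b^2 - gamma E] b
        coef = (gamma - 1.0) * bp / b2 - gamma * e if b2 > 0.0 else 0.0
        boosted.append((gamma * (e - bp),
                        x + coef * bx, y + coef * by, z + coef * bz))
    return boosted
```

After the boost, the energies of the constituents add up to the jet mass and their three-momenta cancel, which is a convenient closure check.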



B. Reclustering in the jet rest frame

We recluster the energy clusters of a jet to reconstruct subjets in the jet rest frame. The reclustering is done using the Cambridge-Aachen (CA) sequential jet reconstruction algorithm [29] with a modified distance parameter of $\Delta\theta = 0.6$, where $\theta$ is defined as the angle between two pseudoparticles in the jet rest frame. We reject jets that have fewer than 3 subjets with $E_{\rm jet} > 10$ GeV, where $E_{\rm jet}$ is the energy of a subjet in the jet rest frame. In the ideal situation without any pileup effects, this requirement rejects roughly 60% of the QCD jets, while keeping almost all the signal t jets. However, the rejection power drops significantly when the average number of multiple interactions per event increases. When the average pileup at the LHC reaches 50 (100) per event, more than 70% (90%) of the QCD jets have at least 3 subjets with $E_{\rm jet} > 10$ GeV.

The mass of a jet depends strongly on the pileup condition: its distribution shifts to higher values as the pileup increases. However, this effect is greatly reduced if we calculate the mass of a jet using its subjets in the center-of-mass frame of the jet, as shown in Fig. 1. Because of the smaller area covered by its cone size and the isotropic distribution of the constituent particles of QCD jets from pileup in the jet rest frame, a reconstructed subjet in the jet rest frame includes much less deposited energy from additional multiple interactions. One advantage of reclustering in the jet rest frame is that two nearby energy clusters that have different momenta in the lab frame can be easily separated geometrically after they are boosted back to the center-of-mass frame of the jet, if they originate from different partons in the top decay. We also point out that the rest frame subjet algorithm is infrared and collinear safe if an infrared and collinear safe jet algorithm is used for the rest frame subjet clustering. All the sophisticated jet-grooming algorithms introduced in the lab frame, such as pruning [21] and trimming, can be easily incorporated.

FIG. 1: The jet mass distributions of the QCD jets (solid line) and signal t jets (dashed line) from SM $t\bar{t}$ production in MC simulated events before (left) and after (right) reclustering in the jet rest frame under different pileup conditions. The mass of the jet after the reclustering is calculated using the three subjets with the highest energies in the jet rest frame. All the distributions are normalized to unity.
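
The reclustering and the subjet requirement can be sketched as follows. This is a simplified stand-in, not the FastJet-based procedure used in the study: it repeatedly merges the pair of rest-frame objects with the smallest opening angle until no pair is closer than $\Delta\theta = 0.6$, which mimics a purely angular sequential recombination. All function names are hypothetical, and inputs are `(E, px, py, pz)` tuples already boosted to the jet rest frame.

```python
import math


def opening_angle(a, b):
    """Angle between the three-momenta of two (E, px, py, pz) vectors."""
    dot = a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
    na = math.sqrt(a[1] ** 2 + a[2] ** 2 + a[3] ** 2)
    nb = math.sqrt(b[1] ** 2 + b[2] ** 2 + b[3] ** 2)
    if na == 0.0 or nb == 0.0:
        return math.pi  # treat a zero-momentum object as far from everything
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def recluster(pseudoparticles, delta_theta=0.6):
    """Simplified angular-ordered recombination in the jet rest frame.
    Returns subjets sorted by decreasing energy."""
    subjets = [tuple(p) for p in pseudoparticles]
    while len(subjets) > 1:
        best = None
        for i in range(len(subjets)):
            for j in range(i + 1, len(subjets)):
                theta = opening_angle(subjets[i], subjets[j])
                if best is None or theta < best[0]:
                    best = (theta, i, j)
        theta, i, j = best
        if theta >= delta_theta:
            break  # all remaining objects are well separated
        merged = tuple(x + y for x, y in zip(subjets[i], subjets[j]))
        subjets = [s for k, s in enumerate(subjets) if k not in (i, j)]
        subjets.append(merged)
    return sorted(subjets, key=lambda s: s[0], reverse=True)


def passes_subjet_cut(subjets, e_min=10.0, n_min=3):
    """Require at least n_min subjets with rest-frame energy above e_min (GeV)."""
    return sum(1 for s in subjets if s[0] > e_min) >= n_min


def invariant_mass(vectors):
    """Invariant mass of the sum of a list of four-vectors."""
    E, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))
```

With these helpers, the rest-frame jet mass discussed in this section would be `invariant_mass(subjets[:3])`, computed from the three highest-energy subjets.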



C. Jet substructure variables 

We introduce several jet substructure variables to identify boosted t jets. All of them are calculated using the subjet information in the jet rest frame. They are:

• $E_{j_1}$, $E_{j_2}$ and $E_{j_3}$: $E_{j_i}$ is the energy of the subjet $j_i$ in the center-of-mass frame of the jet. Here $j_1$, $j_2$ and $j_3$ denote the subjets with the first, second and third highest energy.

• $m_W$: it is defined as $m_{j_1 j_3}$, the invariant mass of the combination of subjets $j_1$ and $j_3$ in the jet rest frame.

• $m(jj)_{\max}$ and $m(jj)_{\min}$: they are defined as $m(jj)_{\max} = \max(m_{j_1 j_2}, m_{j_2 j_3})$ and $m(jj)_{\min} = \min(m_{j_1 j_2}, m_{j_2 j_3})$.

• Asymmetry of the energy $A_E$: it is defined as $A_E = (E_W - E_{j_2})/(E_W + E_{j_2})$, where $E_W$ is the sum of the energies $E_{j_1}$ and $E_{j_3}$.

• $\Delta\theta$: it is defined as the opening angle between $\vec{p}_{j_1} + \vec{p}_{j_3}$ and $\vec{p}_{j_2}$, where $\vec{p}_{j_i}$, $i = 1, 2$ and 3, is the momentum of the subjet $j_i$ in the jet rest frame.

FIG. 2: The distributions of the jet substructure variables defined in the text for the MC simulated QCD jet background (solid line) and t jet signal (dashed line) from SM $t\bar{t}$ production under different pileup conditions. The jets used in the evaluation of the jet substructure performance are required to have $p_T > 600$ GeV, 50 GeV < $m_{\rm jet}$ < 350 GeV and at least 3 subjets with $E_{\rm jet} > 10$ GeV in their rest frame. All the distributions are normalized to unity.

FIG. 3: The background rejection of QCD jets vs. the signal efficiency of t jets for the jet substructure variables in different pileup conditions. The jets used in the evaluation of the jet substructure performance are required to have $p_T > 600$ GeV, 50 GeV < $m_{\rm jet}$ < 350 GeV and at least 3 subjets with $E_{\rm jet} > 10$ GeV in their rest frame.
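
The variables listed in this section can be computed from the three leading rest-frame subjets as in the sketch below. It assumes the reading in which the W candidate is formed from the highest- and third-highest-energy subjets ($j_1$ and $j_3$), so that $m_W = m_{j_1 j_3}$ and $E_W = E_{j_1} + E_{j_3}$; the function and key names are hypothetical.

```python
import math


def _mass(*vs):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    E, px, py, pz = (sum(v[k] for v in vs) for k in range(4))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))


def _angle(p, q):
    """Opening angle between two three-momenta given as (px, py, pz)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def substructure_variables(subjets):
    """Compute the substructure variables from a list of at least three
    rest-frame subjets, each given as (E, px, py, pz)."""
    j1, j2, j3 = sorted(subjets, key=lambda s: s[0], reverse=True)[:3]
    m12, m23 = _mass(j1, j2), _mass(j2, j3)
    e_w = j1[0] + j3[0]  # assumed W candidate: j1 + j3
    p_w = tuple(a + b for a, b in zip(j1[1:], j3[1:]))
    return {
        "E_j1": j1[0], "E_j2": j2[0], "E_j3": j3[0],
        "m_W": _mass(j1, j3),
        "m_jj_max": max(m12, m23),
        "m_jj_min": min(m12, m23),
        "A_E": (e_w - j2[0]) / (e_w + j2[0]),
        "delta_theta": _angle(p_w, j2[1:]),
    }
```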

The distributions of the jet substructure variables are shown in Fig. 2 for the t jet signal and the QCD jet background under different pileup conditions. The corresponding signal efficiencies of t jets vs. the background rejections of QCD jets are shown in Fig. 3. We see a significant difference between the signal and background distributions for those jet substructure variables. All of them show some dependence on the pileup conditions. The effects are slightly smaller for the t jet signal than for the QCD jet background. As shown in Fig. 3, although the performance of the jet substructure variables varies with the average number of multiple interactions per event, the rejection power against the QCD jet background at a fixed t jet signal identification efficiency does not degrade significantly.

D. Boosted decision tree top tagger

Most of the jet substructure variables that we introduced in the previous section are strongly correlated with each other. In order to utilize the maximum discriminating power of those variables, multivariate analysis techniques, such as neural networks or boosted decision trees (BDT), are typically necessary to combine their information. In this section, we demonstrate such an application by constructing a BDT variable using the following jet substructure quantities: $E_{j_1}$, $E_{j_2}$, $E_{j_3}$, $m_W$, $m(jj)_{\max}$, $m(jj)_{\min}$, $A_E$ and $\Delta\theta$. Note that the signal and background separation power of the variables $A_E$ and $\Delta\theta$ is not as good as that of the others, as shown in Figs. 2 and 3. However, their correlations with the other variables are relatively small (less than 30% in most cases). As a result, we keep them in the BDT algorithm to add discriminating power. We study the BDT algorithm using MC events generated with different average numbers of multiple interactions per event and optimize its performance separately for each case. The signal efficiency of t jets vs. the background rejection of QCD jets for the BDT variable is shown in Fig. 4. Regardless of the pileup conditions, the jet substructure variables we proposed can easily reduce the contribution of the QCD jet background by a factor of approximately 100, with only a factor of three reduction in the t jet signal identification efficiency. In principle, we expect the tagger performance to get worse when the pileup increases. As shown in Fig. 4, the performance of the tagger is actually better with higher pileup for several ranges of the signal efficiency. This is an artifact caused by the selection of the jets used in the evaluation of the tagger performance. In our studies, we only use jets that have $p_T > 600$ GeV, 50 GeV < $m_{\rm jet}$ < 350 GeV and at least 3 subjets with $E_{\rm jet} > 10$ GeV in their rest frame. As a result, when the pileup increases, many QCD jets that otherwise would not satisfy the jet selection criteria are selected.

FIG. 4: The background rejection of QCD jets vs. the signal efficiency of t jets for the boosted decision tree top tagger in different pileup conditions. The jets used in the evaluation of the tagger performance are required to have $p_T > 600$ GeV, 50 GeV < $m_{\rm jet}$ < 350 GeV and at least 3 subjets with $E_{\rm jet} > 10$ GeV in their rest frame.
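
To make the notion of a working point concrete, the sketch below reads one point off a rejection-versus-efficiency curve from two lists of tagger scores. It assumes that a higher score is more signal-like and that background rejection is defined as the inverse of the background efficiency; the text does not state which convention the figures use, and the function name is hypothetical.

```python
import math


def rejection_at_efficiency(signal_scores, background_scores, target_eff):
    """Return (threshold, signal_efficiency, background_rejection) for the
    tightest cut 'score >= threshold' that keeps at least target_eff of the
    signal. Rejection is defined here as 1 / (background efficiency)."""
    ordered = sorted(signal_scores, reverse=True)
    n_sig = len(ordered)
    # Number of signal jets that must pass; small offset guards rounding.
    k = max(1, math.ceil(target_eff * n_sig - 1e-12))
    threshold = ordered[k - 1]

    sig_eff = sum(1 for s in signal_scores if s >= threshold) / n_sig
    n_bkg_pass = sum(1 for b in background_scores if b >= threshold)
    if n_bkg_pass == 0:
        rejection = math.inf  # no background survives in this sample
    else:
        rejection = len(background_scores) / n_bkg_pass
    return threshold, sig_eff, rejection
```

Scanning `target_eff` from 0 to 1 traces out a curve of the kind described for the BDT tagger. A working point with a signal efficiency near one third and a rejection near 100 would correspond to the numbers quoted above.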

Compared to the performance of other existing top taggers, the BDT top tagger based on jet substructure in the jet rest frame has a similar background rejection for a given signal identification efficiency of boosted hadronically decaying top quarks. In our studies, the jets are reconstructed using a distance parameter $\Delta R = 0.6$, which is different from that used in many other studies. We also consider extreme pileup conditions, which are absent from many performance studies of existing top taggers. Our results show that a top tagger based on jet substructure in the jet rest frame is complementary to other top tagging tools.



IV. APPLICATION 

Reconstructed decays of top quarks to jets can be used to search for NP with specific final state signatures. Here we demonstrate such an application by considering a heavy resonance that decays to a $t\bar{t}$ final state: $pp \to X \to t\bar{t}$, where X is a new heavy resonance beyond the SM, such as a new heavy gauge boson $Z'$ or a KK gluon in the Randall-Sundrum model.

We consider a search for an X signal by fully reconstructing the X signal candidate in the decay mode where both top quarks decay hadronically. Note that for such high mass (> 2.0 TeV) resonance decays, more than 90% of the events have the decay particles from each top decay within a cone of $R < 0.6$. This makes it very difficult to identify separate subjets from top quark decays in the lab frame. As a result, we select the two jets with the highest and second highest energy that satisfy $p_T > 600$ GeV and $|\eta| < 1.9$ in an event as the two hadronically decaying top quark candidates. The jets are reconstructed using the anti-$k_T$ algorithm with a distance parameter $\Delta R = 0.6$. We then combine the two top jet candidates to form the $X \to t\bar{t}$ resonance candidate.
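
The candidate reconstruction described above amounts to a kinematic selection followed by a four-vector sum. A minimal sketch, with hypothetical function names and jets given as `(E, px, py, pz)` tuples in GeV:

```python
import math


def jet_pt(jet):
    """Transverse momentum of an (E, px, py, pz) jet."""
    return math.hypot(jet[1], jet[2])


def jet_eta(jet):
    """Pseudorapidity; only called for jets with non-zero pT."""
    return math.asinh(jet[3] / jet_pt(jet))


def x_candidate_mass(jets, pt_min=600.0, eta_max=1.9):
    """Invariant mass of the two highest-energy jets passing the pT and eta
    requirements, or None if fewer than two jets pass."""
    selected = [j for j in jets
                if jet_pt(j) > pt_min and abs(jet_eta(j)) < eta_max]
    if len(selected) < 2:
        return None
    j1, j2 = sorted(selected, key=lambda j: j[0], reverse=True)[:2]
    E, px, py, pz = (j1[k] + j2[k] for k in range(4))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))
```

In the full analysis each of the two jets would additionally have to pass the top tagger before the candidate is kept; that step is omitted here.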




The invariant mass distributions of the $X \to t\bar{t}$ candidates in the MC simulated event sample that is equivalent to 100 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy are shown in Fig. 5. Here we consider that the average pileup is 50 and that the mass of the heavy resonance is $m_X = 4$ TeV with a natural width of 100 GeV. The major SM backgrounds are QCD events and $t\bar{t}$ production. In order to reduce the background, we apply the hadronic top jet identification using the BDT top tagger described in Section III D. The selection criterion on the BDT variable is optimized to have approximately 40% identification efficiency for a top jet, while keeping the fake rate of QCD jets below 2%. As a result, we are able to reduce the SM background by a factor of more than 2000, while keeping roughly 15% of the signal events. As shown in Fig. 5, such a selection dramatically improves the signal-to-background ratio and makes it possible for us to observe a potential $X \to t\bar{t}$ signal if the product of its production cross section and the branching fraction for its decay into a $t\bar{t}$ pair is of the order of a few tens of fb.

We repeat the study for a heavy resonance X with different masses under various pileup conditions. We estimate the expected 95% C.L. upper limit on the product of the production cross section of a heavy resonance X and the branching fraction for its decay into a $t\bar{t}$ pair. The expected limit for 100 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy is plotted as a function of the assumed X mass in Fig. 6. All the studies show similar results. The BDT top tagger based on the jet substructure information in the jet rest frame can significantly improve our experimental sensitivity in the search for $pp \to X \to t\bar{t}$. The performance of the top tagger decreases when the average number of multiple interactions per event increases. However, the degradation is rather small. Even in the case of an extreme pileup condition with 100 multiple interactions per event, the experimental sensitivity drops by less than a factor of 2 compared to that of an ideal experiment with zero pileup.

FIG. 5: The invariant mass distribution of the $X \to t\bar{t}$ candidates in the MC simulated event sample that is equivalent to 100 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy before (top) and after (bottom) the BDT top tagger is applied. The MC events used to make the plot are generated with an assumption of 50 multiple interactions per event on average. The open histogram is the background distribution from QCD events, the hatched histogram is the background contribution from SM $t\bar{t}$ production, and the dashed histogram is the expected signal distribution from $pp \to X \to t\bar{t}$. Here we assume a width of 100 GeV for the heavy resonance with $m_X = 4$ TeV, and the product of the production cross section of X and the branching fraction for its decay into a $t\bar{t}$ pair is 10 fb.
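
The per-event numbers quoted for the tagged selection follow from the per-jet working point if both jets in an event must be tagged and the two tags are treated as independent. That independence is an assumption of this sketch, which also ignores the kinematic acceptance and the $t\bar{t}$ background; the function name is hypothetical.

```python
def double_tag_rates(top_eff, qcd_fake_rate):
    """Per-event signal efficiency and QCD rejection factor when both jets
    must pass the tagger, assuming the two tags are independent."""
    signal_eff = top_eff ** 2
    qcd_rejection = 1.0 / (qcd_fake_rate ** 2)
    return signal_eff, qcd_rejection


# Per-jet working point from the text: about 40% efficiency, fake rate below 2%.
signal_eff, qcd_rejection = double_tag_rates(0.40, 0.02)
```

With these inputs the signal efficiency is 0.16 and the QCD rejection factor is 2500, in line with the "roughly 15%" and "more than 2000" quoted in the text.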

FIG. 6: The expected 95% C.L. upper limit on the product of the production cross section of a heavy resonance X and the branching fraction for its decay into a $t\bar{t}$ pair, as a function of the assumed X mass under different pileup conditions. Here we assume a width of 100 GeV for the heavy resonance and 100 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy.

V. CONCLUSION

In this paper we study the identification of boosted hadronically decaying top quarks using jet substructure in the center-of-mass frame of the jet. We demonstrate that the method can greatly reduce the QCD jet background while maintaining a high identification efficiency for boosted hadronically decaying top quarks, even under very high pileup conditions. The study shows good prospects for searches for heavy particles in the $t\bar{t}$ decay channel for the LHC experiments at 14 TeV center-of-mass energy.